mtcars Example Codebook
Here, we’re just setting a few options.
knitr::opts_chunk$set(
  warning = TRUE, # show warnings during codebook generation
  message = TRUE, # show messages during codebook generation
  error = TRUE, # do not interrupt codebook generation in case of errors,
                # usually better for debugging
  echo = TRUE # show R code
)
ggplot2::theme_set(ggplot2::theme_bw())
sessionInfo()
## R version 4.0.1 (2020-06-06)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18362)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252
## [3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C
## [5] LC_TIME=English_Canada.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] knitr_1.29 magrittr_1.5 tidyselect_1.1.0 munsell_0.5.0
## [5] colorspace_1.4-1 R6_2.4.1 rlang_0.4.6 stringr_1.4.0
## [9] dplyr_1.0.0 tools_4.0.1 grid_4.0.1 gtable_0.3.0
## [13] xfun_0.15 htmltools_0.5.0 ellipsis_0.3.1 yaml_2.2.1
## [17] digest_0.6.25 tibble_3.0.1 lifecycle_0.2.0 crayon_1.3.4
## [21] purrr_0.3.4 ggplot2_3.3.2 vctrs_0.3.1 glue_1.4.1
## [25] evaluate_0.14 rmarkdown_2.3 stringi_1.4.6 compiler_4.0.1
## [29] pillar_1.4.4 generics_0.0.2 scales_1.1.1 pkgconfig_2.0.3
Now, we’re preparing our data for the codebook.
library(codebook)
codebook_data <- mtcars
# omit the following lines if your missing values are already properly labelled
codebook_data <- detect_missing(codebook_data,
  only_labelled = TRUE, # only labelled values are autodetected as
                        # missing
  negative_values_are_missing = FALSE, # do not treat negative values as missing
  ninety_nine_problems = TRUE # 99/999 are missing values, if they
                              # are more than 5 MAD from the median
)
# add variable descriptions
var_label(codebook_data) <- list(
  mpg = "Miles/(US) gallon.",
  cyl = "Number of cylinders.",
  disp = "Displacement (cu.in.).",
  hp = "Gross horsepower.",
  drat = "Rear axle ratio.",
  wt = "Weight (1000 lbs).",
  qsec = "1/4 mile time.",
  vs = "Engine shape.",
  am = "Transmission type.",
  gear = "Number of forward gears.",
  carb = "Number of carburetors."
)
val_labels(codebook_data$vs) <- c("V-shaped" = 0, "straight" = 1)
val_labels(codebook_data$am) <- c("automatic" = 0, "manual" = 1)
# Name of the dataset
metadata(codebook_data)$name <- "`mtcars` example dataset from the datasets package"
# description of the dataset
metadata(codebook_data)$description <- "The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models)."
# when was the data collected: ideally in ISO 8601 format
metadata(codebook_data)$temporalCoverage <- "1973/1974"
metadata(codebook_data)$citation <- "Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411."
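The `ninety_nine_problems` comment in the call above describes a rule: 99 and 999 count as missing only when they lie more than 5 MAD from the median. A minimal base-R sketch of that rule follows. The helper `flag_ninety_nine()` and the example vectors are hypothetical, and this is not the codebook package's actual implementation.

```r
# Hypothetical helper, for illustration only: flags 99/999 as likely
# missing-value codes when they are far from the bulk of the data.
flag_ninety_nine <- function(x, codes = c(99, 999), n_mad = 5) {
  center <- median(x, na.rm = TRUE)
  spread <- mad(x, na.rm = TRUE) # scaled median absolute deviation
  x %in% codes & abs(x - center) > n_mad * spread
}

# Made-up ages: 99 and 999 sit far outside the observed range.
ages <- c(25, 27, 29, 30, 31, 33, 35, 99, 999)
ages[flag_ninety_nine(ages)] <- NA

# Made-up test scores: 99 is a plausible value here, so it is kept.
scores <- c(85, 92, 95, 99, 101, 105, 110)
any(flag_ninety_nine(scores))
```

The distance check is what keeps a legitimate 99 from being discarded: the code only becomes suspicious when nothing else in the variable is near it.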
You can find other useful pieces of metadata, which might help others make use of your data in the future, at https://schema.org/Dataset
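As a sketch of what such additional metadata could look like, the lines below set a few more schema.org `Dataset` properties with the same `metadata()` accessor used above. The property names (`url`, `identifier`, `spatialCoverage`) come from schema.org; the values are placeholders, not facts about `mtcars`, and how the codebook renders each field is not shown here.

```r
library(codebook)

codebook_data <- mtcars

# Placeholder values for illustration; replace them with your dataset's own.
metadata(codebook_data)$url <- "https://example.org/my-dataset"
metadata(codebook_data)$identifier <- "https://example.org/my-dataset/doi-placeholder"
metadata(codebook_data)$spatialCoverage <- "Placeholder region"

# Inspect what has been set so far.
str(metadata(codebook_data))
```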
Create codebook
codebook(codebook_data)
## No missing values.
Dataset name: mtcars example dataset from the datasets package
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
Metadata for search engines
Temporal Coverage: 1973/1974
Citation: Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.
Date published: 2020-07-26
Variables
Miles/(US) gallon.
Distribution of values for mpg
0 missing values.
| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| mpg | Miles/(US) gallon. | numeric | 0 | 1 | 10 | 19 | 34 | 20.09062 | 6.026948 | ▃▇▅▁▂ |
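The summary columns in the row above can be cross-checked with base R. This sketch assumes that `min`, `median`, and `max` are rounded for display (the raw values for `mpg` are 10.4, 19.2, and 33.9) and that `complete_rate` is the share of non-missing values; the codebook's own computation may differ in detail.

```r
x <- mtcars$mpg

# Recompute the numeric summary columns for one variable.
summary_row <- data.frame(
  name = "mpg",
  n_missing = sum(is.na(x)),
  complete_rate = mean(!is.na(x)),
  min = min(x),
  median = median(x),
  max = max(x),
  mean = mean(x),
  sd = sd(x)
)
print(summary_row)
```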
Number of cylinders.
Distribution of values for cyl
0 missing values.
| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| cyl | Number of cylinders. | numeric | 0 | 1 | 4 | 6 | 8 | 6.1875 | 1.785922 | ▆▁▃▁▇ |
Displacement (cu.in.).
Distribution of values for disp
0 missing values.
| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| disp | Displacement (cu.in.). | numeric | 0 | 1 | 71 | 196 | 472 | 230.7219 | 123.9387 | ▇▃▃▃▂ |
Gross horsepower.
Distribution of values for hp
0 missing values.
| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| hp | Gross horsepower. | numeric | 0 | 1 | 52 | 123 | 335 | 146.6875 | 68.56287 | ▇▇▆▃▁ |
Rear axle ratio.
Distribution of values for drat
0 missing values.
| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| drat | Rear axle ratio. | numeric | 0 | 1 | 2.8 | 3.7 | 4.9 | 3.596563 | 0.5346787 | ▇▃▇▅▁ |
Weight (1000 lbs).
Distribution of values for wt
0 missing values.
| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| wt | Weight (1000 lbs). | numeric | 0 | 1 | 1.5 | 3.3 | 5.4 | 3.21725 | 0.9784574 | ▃▃▇▁▂ |
1/4 mile time.
Distribution of values for qsec
0 missing values.
| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| qsec | 1/4 mile time. | numeric | 0 | 1 | 14 | 18 | 23 | 17.84875 | 1.786943 | ▃▇▇▂▁ |
Engine shape.
Distribution of values for vs
0 missing values.
| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | n_value_labels | hist |
|---|---|---|---|---|---|---|---|---|---|---|---|
| vs | Engine shape. | haven_labelled | 0 | 1 | 0 | 0 | 1 | 0.4375 | 0.5040161 | 2 | ▇▁▁▁▁▁▁▆ |
| name | value |
|---|---|
| V-shaped | 0 |
| straight | 1 |
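To show what the value labels above mean for the data, the sketch below maps the 0/1 codes of `vs` to their labels using a base-R factor. This is an illustration only: the codebook itself stores the labels on a `haven_labelled` vector rather than converting to a factor.

```r
# Value labels as defined earlier: code 0 is "V-shaped", code 1 is "straight".
vs_labels <- c("V-shaped" = 0, "straight" = 1)

vs_factor <- factor(mtcars$vs,
                    levels = unname(vs_labels),
                    labels = names(vs_labels))

# Count cars per engine shape.
table(vs_factor)
```

The same pattern applies to `am`, with the labels "automatic" and "manual".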
Transmission type.
Distribution of values for am
0 missing values.
| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | n_value_labels | hist |
|---|---|---|---|---|---|---|---|---|---|---|---|
| am | Transmission type. | haven_labelled | 0 | 1 | 0 | 0 | 1 | 0.40625 | 0.4989909 | 2 | ▇▁▁▁▁▁▁▆ |
| name | value |
|---|---|
| automatic | 0 |
| manual | 1 |
Number of forward gears.
Distribution of values for gear
0 missing values.
| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| gear | Number of forward gears. | numeric | 0 | 1 | 3 | 4 | 5 | 3.6875 | 0.7378041 | ▇▁▆▁▂ |
Number of carburetors.
Distribution of values for carb
0 missing values.
| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| carb | Number of carburetors. | numeric | 0 | 1 | 1 | 2 | 8 | 2.8125 | 1.6152 | ▇▂▅▁▁ |
JSON-LD metadata
The following JSON-LD can be found by search engines, if you share this codebook publicly on the web.
{
"name": "`mtcars` example dataset from the datasets package",
"description": "The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).\n\n\n## Table of variables\nThis table contains variable names, labels, and number of missing values.\nSee the complete codebook for more.\n\n|name |label | n_missing|\n|:----|:------------------------|---------:|\n|mpg |Miles/(US) gallon. | 0|\n|cyl |Number of cylinders. | 0|\n|disp |Displacement (cu.in.). | 0|\n|hp |Gross horsepower. | 0|\n|drat |Rear axle ratio. | 0|\n|wt |Weight (1000 lbs). | 0|\n|qsec |1/4 mile time. | 0|\n|vs |Engine shape. | 0|\n|am |Transmission type. | 0|\n|gear |Number of forward gears. | 0|\n|carb |Number of carburetors. | 0|\n\n### Note\nThis dataset was automatically described using the [codebook R package](https://rubenarslan.github.io/codebook/) (version 0.9.2).",
"temporalCoverage": "1973/1974",
"citation": "Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.",
"datePublished": "2020-07-26",
"keywords": ["mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb"],
"@context": "http://schema.org/",
"@type": "Dataset",
"variableMeasured": [
{
"name": "mpg",
"description": "Miles/(US) gallon.",
"@type": "propertyValue"
},
{
"name": "cyl",
"description": "Number of cylinders.",
"@type": "propertyValue"
},
{
"name": "disp",
"description": "Displacement (cu.in.).",
"@type": "propertyValue"
},
{
"name": "hp",
"description": "Gross horsepower.",
"@type": "propertyValue"
},
{
"name": "drat",
"description": "Rear axle ratio.",
"@type": "propertyValue"
},
{
"name": "wt",
"description": "Weight (1000 lbs).",
"@type": "propertyValue"
},
{
"name": "qsec",
"description": "1/4 mile time.",
"@type": "propertyValue"
},
{
"name": "vs",
"description": "Engine shape.",
"value": "0. V-shaped,\n1. straight",
"maxValue": 1,
"minValue": 0,
"@type": "propertyValue"
},
{
"name": "am",
"description": "Transmission type.",
"value": "0. automatic,\n1. manual",
"maxValue": 1,
"minValue": 0,
"@type": "propertyValue"
},
{
"name": "gear",
"description": "Number of forward gears.",
"@type": "propertyValue"
},
{
"name": "carb",
"description": "Number of carburetors.",
"@type": "propertyValue"
}
]
}
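A record with the same overall shape as the JSON-LD above can be assembled by hand. The sketch below builds a reduced version from a named vector of variable labels and serialises it with the jsonlite package. Using jsonlite is an assumption made for illustration; the source does not say how the codebook package produces its JSON. To be discoverable, such a record is typically embedded in the page inside a `<script type="application/ld+json">` element.

```r
library(jsonlite)

# A subset of the variable labels used in this codebook.
labels <- c(
  mpg = "Miles/(US) gallon.",
  cyl = "Number of cylinders.",
  hp = "Gross horsepower."
)

# One entry per variable, as an unnamed list so it serialises as a JSON array.
variables <- lapply(seq_along(labels), function(i) {
  list(
    name = names(labels)[i],
    description = unname(labels[i]),
    "@type" = "propertyValue"
  )
})

json_ld <- list(
  "@context" = "http://schema.org/",
  "@type" = "Dataset",
  name = "`mtcars` example dataset from the datasets package",
  temporalCoverage = "1973/1974",
  variableMeasured = variables
)

json_text <- toJSON(json_ld, auto_unbox = TRUE, pretty = TRUE)

# Wrap the JSON in the script element that marks it as JSON-LD.
html_snippet <- paste0(
  '<script type="application/ld+json">\n', json_text, "\n</script>"
)
cat(html_snippet)
```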